
phar: cap decompression output against declared uncompressed_filesize #21806

Open
iliaal wants to merge 1 commit into php:PHP-8.4 from iliaal:fix/phar-decompression-bounds

Conversation

@iliaal (Contributor) commented Apr 18, 2026

Prevents decompression amplification in phar_open_entry_fp().

The function decompressed entries by streaming compressed_filesize bytes through a filter into a tmpfile, then checked the output size after the full copy. A crafted phar whose compressed data expanded beyond uncompressed_filesize could write large amounts to disk before the check ran.

Replaces the single php_stream_copy_to_stream_ex() call with an 8 KiB chunked loop that flushes the decompression filter and checks the running output size after each chunk. Aborts when output exceeds uncompressed_filesize.
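The chunk-and-check pattern is easy to see outside the PHP stream layer. Below is a minimal Python sketch using the stdlib bz2 module (the 8 KiB chunk mirrors the patch; the function name and error handling are illustrative, not the PHP code): the decompressor's max_length argument bounds how much output each step may emit, so the running total can be checked before the next chunk is produced.

```python
import bz2

def decompress_capped(compressed: bytes, declared_size: int, chunk: int = 8192) -> bytes:
    """Decompress, aborting as soon as output exceeds declared_size.

    Sketch of the patch's chunked loop: bound each step's output and check
    the running total, instead of one bulk copy followed by a single
    after-the-fact size check.
    """
    d = bz2.BZ2Decompressor()
    out = bytearray()
    data = compressed
    while not d.eof:
        # Feed input only when the decompressor asks for it; otherwise
        # drain output it has already buffered internally.
        piece = d.decompress(data if d.needs_input else b"", max_length=chunk)
        data = b""
        out += piece
        if len(out) > declared_size:
            raise ValueError("output exceeds declared uncompressed_filesize")
        if not piece and not d.eof:
            raise ValueError("truncated or stalled stream")
    return bytes(out)
```

With an honest declared size the loop completes normally; with a lying one it aborts after the first over-budget chunk instead of materializing the full payload.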

Comment thread ext/phar/util.c
Comment on lines -929 to 944
-	if (SUCCESS != php_stream_copy_to_stream_ex(phar_get_entrypfp(entry), ufp, entry->compressed_filesize, NULL)) {
-		spprintf(error, 4096, "phar error: internal corruption of phar \"%s\" (actual filesize mismatch on file \"%s\")", phar->fname, entry->filename);
-		php_stream_filter_remove(filter, 1);
-		return FAILURE;
+	size_t remaining = entry->compressed_filesize;
+	while (remaining > 0) {
+		size_t chunk = remaining < 8192 ? remaining : 8192;
+		size_t n = 0;
+		if (SUCCESS != php_stream_copy_to_stream_ex(phar_get_entrypfp(entry), ufp, chunk, &n)) {
+			goto decompression_error;
+		}
+		if (n == 0) {
+			break;
+		}
+		remaining -= n;
+		php_stream_filter_flush(filter, 0);
+		if (php_stream_tell(ufp) - loc > (zend_off_t) entry->uncompressed_filesize) {
+			goto decompression_error;
+		}
+	}
Member
I don't understand this whole logic nor the point of it. Surely it suffices to pass the size_t n as a pointer and check that the copy succeeds and the size is the same as compressed_filesize. No need for this looping nonsense?

Contributor Author

Ouch, "looping nonsense" 😄

The n returned by php_stream_copy_to_stream_ex is bytes read from the source (see *len = haveread at main/streams/streams.c:1831), not bytes written to the destination after the decompression filter runs. Checking n == compressed_filesize just confirms we consumed the declared compressed length; it tells us nothing about how much decompressed data landed on ufp.

The underlying issue is amplification: a phar entry declares e.g. compressed_filesize=200, uncompressed_filesize=1024, but those 200 bytes decompress to gigabytes. The pre-patch code already had a post-copy size check (php_stream_tell(ufp) - loc != uncompressed_filesize), but it only fires after the single bulk copy has already flushed the full decompressed payload to tmpfile. The goal of this patch is to bound what hits disk, not to detect the mismatch.

The only signal that bounds filter output is the write position on ufp. The chunked loop checks php_stream_tell(ufp) - loc > uncompressed_filesize after each 8 KiB chunk, with php_stream_filter_flush(filter, 0) per iteration so tell() reflects the filter output rather than still-buffered bytes. This looked like the simplest solution, but maybe there are others; a single copy_to_stream_ex with an n check won't catch this.
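The consumed-versus-emitted gap described above is easy to demonstrate with any decompression API; here is a Python stdlib sketch (not the PHP streams code):

```python
import zlib

# One megabyte of zeros compresses down to roughly a kilobyte...
comp = zlib.compress(b"\x00" * 1_000_000)

# ...so "bytes moved in" (what a copy loop can observe on the source side)
# wildly understates "bytes emitted" (what lands on the destination after
# the decompression filter runs).
out = zlib.decompress(comp)
print(f"{len(comp)} bytes in -> {len(out)} bytes out")
```

A check against the input count (the analogue of n == compressed_filesize) would pass here no matter how large the output grew.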

Member

The description of _php_stream_copy_to_stream_ex says that it sets *len to the number of bytes moved. So are you telling me this is a lie? Or is the problem that we use a stream filter to do the decompression (which is maybe the issue...)?

I'm not sure I see the point in preventing this from hitting the disk if we are only writing to a temporary file. Unless this is somehow a security issue, this seems to do extra work and complicate the code for little benefit.

Contributor Author

The docstring isn't lying, but with a write filter on the destination, "bytes moved" means bytes fed into the filter (== compressed_filesize on success), not bytes the filter emits to the underlying file. Your 2nd point is the right one: because decompression runs as a filter, *len can't report post-filter output, so the caller has no way to learn the amplified size from a single copy_to_stream_ex call.

On the security side: technically a DoS from a malicious phar is possible here, though a low-quality one given the lack of persistence. More of a bug with security implications than an advisory imho, hence filed as a bug. A phar entry with compressed_filesize=200, uncompressed_filesize=1024 whose 200 bytes decompress to several GiB will write several GiB to the tmpfile before the existing post-copy check fires. An attacker can exhaust /tmp or swap with a handful of KiB of input against any service that opens entries from an untrusted phar (composer-style tooling, CI pipelines scanning phars, upload processors). The "filesize mismatch" path was designed to flag corruption, not to bound amplification, so it fires after the damage.

A check that runs during the copy is still required; there's no way to bound amplification after the fact.

@iliaal iliaal requested a review from Girgias April 20, 2026 15:27
@ndossche (Member)

So what does this actually fix?
What's the attack model? If the attacker can control the phar file, which you assume here, it doesn't matter if they lie about the file size or not, they can make a legitimate phar file that will decompress to gigabytes of data, without the check firing.
NAK from me.

@iliaal (Contributor, Author) commented Apr 20, 2026

Yes, the attacker would need control of the phar for this to matter, which as you said is already a risky position; hence this is not an advisory, just a potential issue.

Granted, the proposed solution has a potential performance cost from reading in small (typical PHP-sized) buffers. Perhaps a deeper fix inside php_stream_copy_to_stream_ex could check for oddities like a 1:100 decompression ratio, which would be very unusual, etc... but that is probably a higher-complexity fix.

That said, this is still a concern given the amplification aspect; it could in theory be chained with something else so that a few bytes of input cause real trouble. I do think it should be addressed, but I can see the reasoning behind the NAK (a first for me).
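The ratio heuristic floated above can be sketched in a few lines of Python (stdlib only; the threshold, names, and chunking are hypothetical illustrations, and the real change would live in C inside the streams layer): track bytes consumed against bytes emitted and abort once the running ratio becomes implausible.

```python
import zlib

def inflate_with_ratio_guard(comp: bytes, max_ratio: int = 100, chunk: int = 8192) -> bytes:
    """Abort when output exceeds max_ratio times the input consumed so far."""
    d = zlib.decompressobj()
    out = bytearray()
    buf = comp
    consumed = 0
    while buf and not d.eof:
        before = len(buf)
        out += d.decompress(buf, chunk)     # max_length caps output per call
        buf = d.unconsumed_tail             # input the decompressor didn't use yet
        consumed += before - len(buf)
        if len(out) > max_ratio * max(consumed, 1):
            raise ValueError("implausible decompression ratio")
    return bytes(out)
```

A fixed ratio is a blunt instrument (legitimate highly compressible data exists), which is part of why an explicit caller-supplied cap is the more general answer.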

@ndossche (Member)

Perhaps a deeper fix inside php_stream_copy_to_stream_ex could check for oddities like a 1:100 decompression ratio... but that is probably a higher-complexity fix.

I agree that if we want to stop amplification attacks (which is difficult to do generally), a more general solution is desirable rather than doing this in many different places at once.

Adds an optional max_output parameter to zlib.inflate and
bzip2.decompress filters. When set, the filter tracks bytes emitted
and aborts with PSFS_ERR_FATAL once the cap is exceeded, stopping
decompression amplification mid-stream instead of after the full
payload has landed on disk.

phar_open_entry_fp() passes entry->uncompressed_filesize as the cap
so a lying phar (small declared size, payload that decompresses
beyond it) fails during streaming. The previous post-copy filesize
mismatch check is retained to catch under-decompression.

The parameter is opt-in: omitting the key keeps existing behavior for
all current callers.
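The opt-in semantics can be mimicked in a Python sketch (stdlib bz2; the max_output name echoes the commit message, while the class and OverflowError are illustrative stand-ins for the C filter and PSFS_ERR_FATAL): with the cap omitted, behavior is unchanged; with a cap, decompression fails mid-stream.

```python
import bz2

class CappedDecompressor:
    """Sketch of an opt-in output cap, echoing the proposed max_output
    filter parameter. max_output=None keeps plain, uncapped behavior."""

    def __init__(self, max_output=None):
        self._d = bz2.BZ2Decompressor()
        self._emitted = 0
        self._max = max_output

    def decompress(self, data: bytes, chunk: int = 8192) -> bytes:
        out = bytearray()
        while not self._d.eof:
            piece = self._d.decompress(
                data if self._d.needs_input else b"", max_length=chunk)
            data = b""
            out += piece
            self._emitted += len(piece)
            if self._max is not None and self._emitted > self._max:
                # analogue of the filter aborting with PSFS_ERR_FATAL
                raise OverflowError("max_output exceeded")
            if not piece and not self._d.eof:
                break  # caller must supply more input
        return bytes(out)
```

Because the cap is checked as bytes are emitted, a payload that decompresses far beyond its declared size fails after at most one over-budget chunk, while existing uncapped callers see no behavioral change.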


3 participants